Abstract. - I consider a class of fitness landscapes, in which the fitness is a function of a 
finite number of phenotypic "traits", which are themselves linear functions of the genotype. 
I show that the stationary trait distribution in such a landscape can be explicitly evaluated 
in a suitably defined "thermodynamic limit", which is a combination of infinite-genome and 
strong selection limits. These considerations can be applied in particular to identify relevant 
features of the evolution of promoter binding sites, in spite of the shortness of the corresponding 
sequences. 



The quasispecies (QS) model is extremely useful to investigate the behavior of populations
evolving in a given fitness landscape, although it is based on a rather unrealistic
infinite-population approximation. It leads to the QS equation, which is a deterministic
evolution equation for the fraction of individuals in the population carrying a given genotype.
The dimensionality of the QS equation is in principle equal to the number of possible genotypes,
an enormously large number even for the smallest organism. Most analytical treatments of
the QS equation have therefore focused on situations where this number could be reduced
by lumping together genotypes in a small number of classes. In some "master-sequence"
landscapes, where fitness depends only on the Hamming distance from a given, optimal
genotype, it is possible to treat together all genotypes whose Hamming distance from the
master sequence is the same, forming what is called an error class. The QS equation can be
projected on error classes, yielding a vast dimensionality reduction.

However, this simplification is not warranted in a number of interesting cases. Even in 
master sequence landscapes, the fitness can be a function not only of the number of mutations 
away from the peak, but also of where they appear: some sites in the sequence can be more 
important than others. On the other hand, there can be more than one fitness peak, quite 
unrelated to one another. 

There have recently been a number of attempts to describe evolution in a low-dimensional
space of quantitative traits, in a so-called phenotypic approach. In this case the QS
equation is low-dimensional from the outset, but the mutation model is more or less arbitrary.
In particular, one loses the fact that different phenotypes can be expressed by greatly varying
numbers of genotypes, and that equilibrium may be reached from a balance between fitness
and mutational load.

In the present letter I show how the gap between the genotypic approach of the original
QS model and the phenomenological phenotypic approach can be bridged for a class of fitness
landscapes that I shall call the general mean-field landscapes. The main assumption is that
the fitness is a function of a finite number of "phenotypical traits", which are themselves linear
functions of the genotypical sequence. However, the dependence of the fitness on the traits
is more or less arbitrary. Special cases of these landscapes are the single-peak landscape, the
Hopfield landscape [7] considered by Leuthäusser, and the "mesa" landscape introduced
by Gerland and Hwa to model the evolution of DNA binding motifs. I shall describe
the evolutionary dynamics in the "thermodynamic limit" introduced in refs. [10, 11], which is
close in spirit to the strong selection limit considered by Krug to treat the transient in
quasispecies evolution as a form of extremal dynamics.

I shall discuss my approach within a simple two-letter alphabet representation for the
genotype. I defer to a further publication the generalization to the four-letter alphabet and a
discussion of a number of more realistic fitness landscapes. I first define the QS equation and
the general mean-field fitness landscapes. I then show how the solution of the QS equation
can be reduced to an extremal problem within the thermodynamic limit and via an additional
slow change assumption. I then discuss the consequences of this approach in some cases
that describe interesting evolutionary phenomena. The validity of my approximations is
briefly discussed at the end.

I consider a very large population of individuals evolving according to a Darwinian
(reproduction-mutation-selection) mechanism, with one-parent (asexual) reproduction and with
a simple point-mutation model of nucleotide substitution. I assume that the "genotype" is
described by sequences of $L$ binary units, $\sigma = (\sigma_i)$, $i = 1, \ldots, L$, $\sigma_i = \pm 1$. These sequences
may describe, e.g., a short segment of the genome, corresponding to a binding motif.
I also assume non-overlapping generations, and define the fitness weight $W_\sigma$ to be
proportional to the expected number of offspring of an individual carrying the genotype $\sigma$.
In the infinite-population limit, the fraction $x_\sigma(t)$ of individuals carrying the genotype $\sigma$ at
generation $t$ obeys the QS equation

$$x_\sigma(t+1) = \frac{1}{\langle W \rangle_t} \sum_{\sigma'} Q_{\sigma\sigma'}\, W_{\sigma'}\, x_{\sigma'}(t), \qquad (1)$$

where $\langle W \rangle_t = \sum_\sigma W_\sigma x_\sigma(t)$ is the average fitness weight and $Q = (Q_{\sigma\sigma'})$ is the mutation
matrix. Within a simple independent mutation model with mutation probability per generation
equal to $\mu_i$ at unit $i$ one has

$$Q_{\sigma\sigma'} \propto \exp\Big( \sum_{i=1}^{L} \beta_i \sigma_i \sigma'_i \Big), \qquad (2)$$

where $\beta_i = \frac{1}{2} \log\left( \mu_i^{-1} - 1 \right)$.
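The exponential form of the mutation matrix can be checked on a very short genome. The following is a minimal sketch, assuming independent flips with probability $\mu_i$ per unit and the normalization $\prod_i (2\cosh\beta_i)^{-1}$ (the normalization is an assumption worked out here, since eq. (2) is only stated up to a factor); the function names and the numerical values of the $\mu_i$ are hypothetical.

```python
import itertools
import math


def beta_from_mu(mu):
    """Coupling beta = (1/2) log(1/mu - 1) for a per-unit mutation probability mu."""
    return 0.5 * math.log(1.0 / mu - 1.0)


def mutation_matrix(mus):
    """Q built directly from independent per-unit flip probabilities."""
    genotypes = list(itertools.product((1, -1), repeat=len(mus)))
    q = [[math.prod(mu if a != b else 1.0 - mu for a, b, mu in zip(s, sp, mus))
          for sp in genotypes] for s in genotypes]
    return genotypes, q


def mutation_matrix_exponential(mus):
    """Same matrix written as exp(sum_i beta_i s_i s'_i) / prod_i (2 cosh beta_i)."""
    betas = [beta_from_mu(mu) for mu in mus]
    norm = math.prod(2.0 * math.cosh(b) for b in betas)  # assumed normalization
    genotypes = list(itertools.product((1, -1), repeat=len(mus)))
    q = [[math.exp(sum(b * a * c for b, a, c in zip(betas, s, sp))) / norm
          for sp in genotypes] for s in genotypes]
    return genotypes, q


mus = [0.01, 0.05, 0.2]  # hypothetical per-unit mutation probabilities
_, q_direct = mutation_matrix(mus)
_, q_exp = mutation_matrix_exponential(mus)

# Largest entry-wise difference between the two constructions.
max_diff = max(abs(x - y) for r1, r2 in zip(q_direct, q_exp) for x, y in zip(r1, r2))
# Each column sums to one: every parent genotype produces some offspring genotype.
column_sums = [sum(q_direct[i][j] for i in range(len(q_direct))) for j in range(len(q_direct))]
```

With this normalization the two constructions agree entry by entry, which is why only the couplings $\beta_i$ are needed in what follows.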

In a general mean-field landscape, the fitness weight $W_\sigma$ depends on the genotype $\sigma$ only
via a finite number of linear "traits" $m^\alpha_\sigma$, $\alpha = 1, \ldots, p$, defined by

$$m^\alpha_\sigma = \frac{1}{L} \sum_{i=1}^{L} \xi^\alpha_i \sigma_i, \qquad (3)$$

as a function of the vectors $\xi^\alpha = (\xi^\alpha_i)$. One can assume in the following either that these
vectors are known, or that their components are independent random variables. Our results
will be expressed as averages over the distribution of the components of the $\xi^\alpha$ in both cases.
The fitness weight $W_\sigma$ then assumes the form

$$W_\sigma \propto \exp\left( L \kappa\, \phi(\vec m_\sigma) \right), \qquad (4)$$

where $\kappa$ is a selection coefficient, $\vec m_\sigma = (m^\alpha_\sigma)$ and $\phi(\cdot)$ is a rather arbitrary function of its
argument. Special cases include:

The master-sequence landscape: Here $p = 1$ ($m$ is a scalar) and $\phi(m)$ is, say, a
monotonically increasing function of $m$. The master sequence corresponds to $\sigma_i = \mathrm{sign}\, \xi_i$. The
usual sharp-peak landscape corresponds to $\xi_i = \pm 1$ and $\phi(m)$ equal to 1 if $m = 1$ and to
0 otherwise. One can consider in general $\phi(m) = m^A$, where $A$ is an epistasis parameter
(no epistasis for $A = 1$, positive epistasis for $A > 1$, etc.). The "mesa" landscape
corresponds to $\phi(m) = \left( 1 + \exp(-A(m - m_0)) \right)^{-1}$, where $A$ can be taken to
infinity.

The Hopfield landscape: Here $p > 1$ (but finite), and $\phi(\vec m)$ is a function of the
$p$-dimensional vector $\vec m = (m^\alpha)$. In the original Hopfield model one has $\phi(\vec m) = \frac{1}{2} \sum_\alpha (m^\alpha)^2$, but
$\phi$ can be more general. However, even if $\phi(\vec m) = \sum_\alpha \phi_\alpha(m^\alpha)$, the different components
of $\vec m$ are not independent, since adaptation in one component may disrupt adaptation
in the other. For example, one may consider a sequence which should exhibit an affinity
larger than some threshold for a given factor, and lower than another threshold for
another one, as in the "molecular ecology" experiments of Ordoukhanian and Joyce [15],
in which a Class I Ligase ribozyme is made to evolve in the presence of a 10-23 DNA
enzyme which binds to the same subsequence as the substrate.

The Royal Road landscape: This landscape is rather popular in the theory of Genetic
Algorithms, because it embodies neutrality and adaptation in a simple way [16]. The
genotype of length $L = BK$ is divided into $K$ blocks of length $B$ each. For each block $\alpha$,
$m^\alpha$ is defined by $m^\alpha = B^{-1} \sum_{i \in B_\alpha} \xi^\alpha_i \sigma_i$, where $B_\alpha$ is the set of units which belong to
block $\alpha$. The difference with the Hopfield landscapes is that the blocks do not overlap.
If $\phi$ is additive with respect to the blocks, the evolution of each block is independent of
the others in the quasispecies model. The most interesting results are therefore obtained
when there is epistatic interaction between the different blocks.
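The traits of eq. (3) and the special cases listed above can be made concrete on a toy genotype. This is only an illustrative sketch: the function names, the example sequence and the pattern vectors are hypothetical, and the mesa steepness should be kept moderate to avoid floating-point overflow in the exponential.

```python
import math


def traits(sigma, xis):
    """Linear traits m^alpha = (1/L) sum_i xi^alpha_i sigma_i, one per pattern vector."""
    length = len(sigma)
    return [sum(x * s for x, s in zip(xi, sigma)) / length for xi in xis]


def royal_road_traits(sigma, xi, block_length):
    """One trait per non-overlapping block of `block_length` consecutive units."""
    return [sum(xi[i] * sigma[i] for i in range(start, start + block_length)) / block_length
            for start in range(0, len(sigma), block_length)]


def phi_sharp_peak(m):
    """1 on the master sequence (m = 1), 0 otherwise; scalar trait passed as m[0]."""
    return 1.0 if m[0] == 1.0 else 0.0


def phi_mesa(m, m0, steepness):
    """Smooth step located at m0; approaches a step function for large steepness."""
    return 1.0 / (1.0 + math.exp(-steepness * (m[0] - m0)))


def phi_hopfield(m):
    """Original Hopfield form, (1/2) sum_alpha (m^alpha)^2."""
    return 0.5 * sum(c * c for c in m)


def fitness_weight(sigma, xis, phi, kappa):
    """Unnormalized fitness weight exp(L * kappa * phi(m))."""
    return math.exp(len(sigma) * kappa * phi(traits(sigma, xis)))


sigma = (1, 1, -1, 1, -1, -1, 1, 1)                 # hypothetical genotype, L = 8
xis = [(1,) * 8, (1, -1, 1, -1, 1, -1, 1, -1)]      # hypothetical pattern vectors, p = 2
m = traits(sigma, xis)
blocks = royal_road_traits(sigma, xis[0], block_length=4)
w = fitness_weight(sigma, xis, phi_hopfield, kappa=0.1)
```

In the Hopfield case the two traits share the same units, so a mutation generally changes both; in the Royal Road case each unit contributes to exactly one block trait.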

The solution of the QS equation can be expressed as a "functional integral". One defines
$x_\sigma(t) = y_\sigma(t) / \sum_{\sigma'} y_{\sigma'}(t)$, where the $y_\sigma$'s follow the linear QS equation:

$$y_\sigma(t+1) = \sum_{\sigma'} Q_{\sigma\sigma'}\, W_{\sigma'}\, y_{\sigma'}(t). \qquad (5)$$
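The equivalence between the normalized linear iteration and the nonlinear QS equation can be verified by brute force on a short genome. The sketch below assumes a uniform mutation probability and a sharp-peak fitness weight purely for illustration; the parameter values and function names are hypothetical.

```python
import itertools
import math

L, mu, kappa = 4, 0.02, 0.5  # hypothetical illustrative values
genotypes = list(itertools.product((1, -1), repeat=L))
n = len(genotypes)

# Independent point mutations: each unit flips with probability mu.
Q = [[math.prod(mu if a != b else 1.0 - mu for a, b in zip(s, sp))
      for sp in genotypes] for s in genotypes]
# Sharp-peak fitness weight: exp(L * kappa) on the all-(+1) genotype, 1 elsewhere.
W = [math.exp(L * kappa) if all(a == 1 for a in s) else 1.0 for s in genotypes]


def linear_step(y):
    """One generation of the linear equation y(t+1) = Q W y(t)."""
    return [sum(Q[i][j] * W[j] * y[j] for j in range(n)) for i in range(n)]


def qs_step(x):
    """One generation of the QS equation: the linear step divided by the mean fitness."""
    mean_w = sum(W[j] * x[j] for j in range(n))
    return [v / mean_w for v in linear_step(x)]


x = [1.0 / n] * n  # normalized frequencies
y = [1.0] * n      # unnormalized weights with the same initial proportions
for _ in range(30):
    x = qs_step(x)
    y = linear_step(y)

total = sum(y)
max_diff = max(abs(a - b / total) for a, b in zip(x, y))
```

Because the columns of $Q$ sum to one, dividing by the mean fitness at each step keeps $x$ normalized, and the same frequencies are recovered by normalizing $y$ only at the end.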

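One can then write

$$y_\sigma(t+1) = y_{\sigma(t+1)} = \sum_{\sigma(t)} \cdots \sum_{\sigma(0)} \exp\left\{ \sum_{\tau=0}^{t} \left[ \sum_{i=1}^{L} \beta_i \sigma_i(\tau+1) \sigma_i(\tau) + L \kappa\, \phi(\vec m_{\sigma(\tau)}) \right] \right\} y_{\sigma(0)}. \qquad (6)$$

By using the Fourier representation of the delta function, this expression can be written

$$y_{\sigma(t+1)} \propto \int \prod_{\tau=0}^{t} d\vec m(\tau)\, d\vec\lambda(\tau)\, \exp\left\{ \sum_{\tau=0}^{t} L \left( \kappa \phi(\vec m(\tau)) - \vec\lambda(\tau) \cdot \vec m(\tau) \right) \right\}$$
$$\qquad \times \prod_{i=1}^{L} \left\{ \sum_{\sigma_i(t)} \cdots \sum_{\sigma_i(0)} \exp\left[ \sum_{\tau=0}^{t} \left( \beta_i \sigma_i(\tau+1) \sigma_i(\tau) + \vec\lambda(\tau) \cdot \vec\xi_i\, \sigma_i(\tau) \right) \right] \right\} y_{\sigma(0)}. \qquad (7)$$

Here we have set, e.g., $\vec\lambda \cdot \vec\xi_i = \sum_\alpha \lambda^\alpha \xi^\alpha_i$. If we neglect multiple-spin correlation in the initial
condition, the second line can be written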

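$$\prod_{i=1}^{L} \sum_{\sigma_i(0)} \left[ \prod_{\tau=0}^{t} K(\beta_i, \vec\lambda(\tau) \cdot \vec\xi_i) \right]_{\sigma_i(t+1)\, \sigma_i(0)} y_{\sigma_i(0)}, \qquad (8)$$

where $K(\beta, h) = \left( \exp(\beta \sigma \sigma' + h \sigma') \right)$ is the transfer matrix of a 1D Ising model.
If $\vec\lambda(\tau)$ and $\vec m(\tau)$ are "slowly varying" one has

$$\prod_{\tau=t'}^{t} K(\beta_i, \vec\lambda(\tau) \cdot \vec\xi_i) \simeq K^{t-t'}(\beta_i, \vec\lambda(\tau) \cdot \vec\xi_i) \simeq K_{\max}^{t-t'}(\beta_i, \vec\lambda(\tau) \cdot \vec\xi_i)\, P_{\max}(\beta_i, \vec\lambda(\tau) \cdot \vec\xi_i), \qquad (9)$$

where $K_{\max}(\beta_i, h)$ is the larger eigenvalue of $K(\beta_i, h)$, and $P_{\max}(\beta_i, h)$ the projector on the
corresponding eigenvector.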
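The larger eigenvalue of the $2 \times 2$ transfer matrix has a closed form, $K_{\max}(\beta, h) = e^{\beta} \cosh h + \sqrt{e^{2\beta} \sinh^2 h + e^{-2\beta}}$, which is the standard result for the 1D Ising chain and follows from the trace and determinant of $K$. The sketch below checks it against the trace/determinant formula and against repeated multiplication by $K$, which illustrates the dominance of $K_{\max}$ in long products; the function names and the test values of $\beta$ and $h$ are hypothetical.

```python
import math


def transfer_matrix(beta, h):
    """K[s][s'] = exp(beta*s*s' + h*s') with rows and columns ordered as (+1, -1)."""
    spins = (1, -1)
    return [[math.exp(beta * s * sp + h * sp) for sp in spins] for s in spins]


def k_max(beta, h):
    """Closed form for the larger eigenvalue of the transfer matrix."""
    return (math.exp(beta) * math.cosh(h)
            + math.sqrt(math.exp(2 * beta) * math.sinh(h) ** 2 + math.exp(-2 * beta)))


def power_iteration_growth(beta, h, steps=200):
    """Growth factor per multiplication by K once the leading eigenvector dominates."""
    k = transfer_matrix(beta, h)
    v = [0.5, 0.5]  # positive start vector, normalized to unit sum
    growth = 0.0
    for _ in range(steps):
        w = [k[0][0] * v[0] + k[0][1] * v[1], k[1][0] * v[0] + k[1][1] * v[1]]
        growth = w[0] + w[1]
        v = [w[0] / growth, w[1] / growth]
    return growth


beta, h = 1.5, 0.3  # hypothetical test values
k = transfer_matrix(beta, h)
trace = k[0][0] + k[1][1]
det = k[0][0] * k[1][1] - k[0][1] * k[1][0]
eig_from_trace_det = trace / 2 + math.sqrt(trace * trace / 4 - det)
```

The logarithm of this eigenvalue, averaged over the couplings and pattern components, is the only property of the mutation process that enters the saddle-point equations.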
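Define

$$\sum_\sigma \delta\left( L \vec m - L \vec m_\sigma \right) y_\sigma(t) = \exp\left( L F(\vec m, t) \right). \qquad (10)$$

Then, if in the initial condition $y_\sigma$ depends on $\sigma$ only via $\vec m_\sigma$, and within the slow change
approximation,

$$\exp\left( L F(\vec m, t+1) \right) \propto \int \prod_{\tau=0}^{t} d\vec m(\tau)\, d\vec\lambda(\tau)\, \exp\left\{ \sum_{\tau=0}^{t} L \left( \kappa \phi(\vec m(\tau)) - \vec\lambda(\tau) \cdot \vec m(\tau) \right) \right\}$$
$$\qquad \times \exp\left\{ L \sum_{\tau=0}^{t} \overline{\ln K_{\max}(\beta, \vec\lambda(\tau) \cdot \vec\xi)} + L F(\vec m(0), 0) \right\}. \qquad (11)$$

Here $\overline{\ln K_{\max}(\beta, \vec\lambda \cdot \vec\xi)} = \mathcal{F}(\vec\lambda(\tau))$ is the average of $\ln K_{\max}(\beta, \vec\lambda \cdot \vec\xi)$ with respect to the
distribution of the $\beta$'s and of the $\xi$'s. This can be either the average over the actual, known
distribution of these quantities, if one is lucky enough to know it; or over some reasonable a
priori distribution otherwise.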

If $L$ is large enough, this expression can be evaluated by the saddle point method. In
particular, the error threshold can be identified at stationarity, by looking at the extremum of
the function $\kappa \phi(\vec m) - \vec\lambda \cdot \vec m + \mathcal{F}(\vec\lambda)$ with respect to $(\vec m, \vec\lambda)$. This corresponds to the extremum
$\vec m^*$ of $\kappa \phi(\vec m) + \Gamma(\vec m)$, where $\Gamma(\vec m)$ is the Legendre transform of $\mathcal{F}(\vec\lambda)$ with respect to $\vec\lambda$ (^1).
If the distribution of the $\xi$'s is symmetrical, the maximum of $\Gamma(\vec m)$ is located at the origin.
As $\beta$ increases (i.e., as the mutation rate gets smaller), $\Gamma(\vec m)$ becomes flatter and flatter.
As $\beta$ becomes larger than a threshold value $\beta_{\rm th}$, $\vec m^*$ moves away from the origin: this is
the error threshold, see fig. 1 (left). Within a simple "mesa" landscape, the error threshold
is approximately located at the point in which $\kappa \phi(m_0) + \Gamma(m_0)$ becomes larger than $\Gamma(0)$.
For $\beta > \beta_{\rm th}$, the optimum $m^*$ remains close to $m_0$, except (for finite $A$) at extremely small
mutation rates, in agreement with the results of ref. [17].
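The threshold criterion for the mesa landscape can be evaluated in closed form in the simplest setting. The sketch below assumes $p = 1$, $\xi_i = \pm 1$, a uniform coupling $\beta$, a step-function mesa with $\phi(m_0) = 1$, and the standard Ising expression for $K_{\max}$; under these assumptions the stationarity condition of the Legendre transform reads $m = \sinh\lambda / \sqrt{\sinh^2\lambda + e^{-4\beta}}$, which can be inverted explicitly. The function names are hypothetical, and the parameter values $\kappa = 0.005$, $m_0 = 8/9$ are those quoted in the caption of fig. 1.

```python
import math


def gamma(m, beta):
    """Gamma(m) = min over lambda of [ln K_max(beta, lambda) - lambda*m], for |m| < 1."""
    a = math.exp(-2.0 * beta)
    s = a * m / math.sqrt(1.0 - m * m)  # sinh(lambda*) at the stationary point
    lam = math.asinh(s)
    # ln K_max = beta + ln(cosh(lambda) + sqrt(sinh(lambda)^2 + a^2))
    return beta + math.log(math.sqrt(1.0 + s * s) + math.sqrt(s * s + a * a)) - lam * m


def threshold_beta(kappa, m0, lo=0.5, hi=6.0, iterations=60):
    """Bisection on kappa + Gamma(m0) - Gamma(0), negative below the threshold."""
    def balance(beta):
        return kappa + gamma(m0, beta) - gamma(0.0, beta)

    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if balance(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


beta_th = threshold_beta(kappa=0.005, m0=8.0 / 9.0)
```

Under these assumptions the bisection lands close to the value $\beta_{\rm th} = 2.3425$ quoted in the caption of fig. 1. At $m = 0$ the function reduces to $\Gamma(0) = \ln(2\cosh\beta)$, so a sharp-peak landscape ($m_0 \to 1$) would give the criterion $\kappa = \ln(1 + e^{-2\beta})$.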

(^1) This value is not equal to the actual average value of $m$ in the stationary population, but is close enough
to it if the mutation rate is small.

Fig. 1 - Left: The error threshold for the $p = 1$ case in the mesa landscape $\phi(m) = \theta(m - m_0)$.
The function $\kappa \phi(m) + \Gamma(m)$ is plotted vs. $m$ for (from bottom up) $\beta = 2.25, 2.3425, 2.45$. We have
set $\kappa = 0.005$ and $m_0 = 8/9$. One observes that the maximum moves from the origin to $m_0$ for
$\beta = \beta_{\rm th} = 2.3425$. Right: Finite-size scaling $m = m(\beta, L)$ for $\kappa = 0.005$ and $m_0 = 8/9$. X axis:
$(\beta - \beta_{\rm th}) L^{2/3}$ with $\beta_{\rm th} = 2.3425$. From left to right: $L = 8, 16, 32, 64, 128$.



In order to analyze the finite-length behavior of the system, it is useful to apply the
Schrödinger equation approach to the quasispecies equation [9, 18]. The role of the quantum
constant $\hbar$ is played by the inverse of the genome length $L$. In the simple mesa landscape, the
quasispecies equation can be transformed into a Schrödinger equation in a potential which
is the superposition of a harmonic oscillator potential near $m = 0$ and a linear potential
with a barrier near the threshold. The finite-length threshold $\beta_{\rm th}(L)$ can be identified by the
condition that the ground-state energies near the two classical minima are equal. One thus
obtains the result that $\beta_{\rm th}(L)$ reaches its asymptotic value as $\beta_{\rm th}(L) = \beta_{\rm th}(\infty) + O(L^{-2/3})$,
rather than $O(L^{-1})$, which holds for the sharp-peak landscape or for smoother
ones. The width of the distribution in $m$ above the threshold also behaves like $L^{-2/3}$. The
transition is however quite sharp even for moderate values of $L$, as can be seen by the
finite-size scaling analysis shown in fig. 1 (right).
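At finite $L$ the stationary state can also be obtained by direct iteration, which gives a simple way to see how sharp the transition already is for short sequences. The sketch below is a brute-force check, not the Schrödinger-equation approach: it assumes $p = 1$, $\xi_i = 1$, a uniform mutation probability $\mu = 1/(1 + e^{2\beta})$ and a step-function mesa, so that genotypes can be lumped by the number $k$ of units equal to $-1$. The function names and the choice $L = 32$ are hypothetical.

```python
import math


def binom_pmf(n, p):
    """Probabilities of 0..n successes in n independent trials."""
    return [math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(n + 1)]


def class_mutation_matrix(length, mu):
    """M[k2][k]: probability that a genotype with k minus-units mutates into class k2."""
    m = [[0.0] * (length + 1) for _ in range(length + 1)]
    for k in range(length + 1):
        back = binom_pmf(k, mu)               # minus-units flipping to +1
        forward = binom_pmf(length - k, mu)   # plus-units flipping to -1
        for a, pa in enumerate(back):
            for b, pb in enumerate(forward):
                m[k - a + b][k] += pa * pb
    return m


def stationary_mean_trait(length, beta, kappa, m0, generations=1500):
    """Mean trait after iterating the class-projected QS equation from the master sequence."""
    mu = 1.0 / (1.0 + math.exp(2.0 * beta))
    mut = class_mutation_matrix(length, mu)
    trait = [(length - 2 * k) / length for k in range(length + 1)]
    weight = [math.exp(length * kappa) if t >= m0 else 1.0 for t in trait]
    x = [0.0] * (length + 1)
    x[0] = 1.0
    for _ in range(generations):
        wx = [w * v for w, v in zip(weight, x)]
        y = [sum(row[k] * wx[k] for k in range(length + 1)) for row in mut]
        total = sum(y)
        x = [v / total for v in y]
    return sum(t * v for t, v in zip(trait, x))


m_high_beta = stationary_mean_trait(32, 4.0, 0.005, 8.0 / 9.0)  # low mutation rate
m_low_beta = stationary_mean_trait(32, 1.5, 0.005, 8.0 / 9.0)   # high mutation rate
```

Well above the threshold the population stays close to the mesa, while well below it the mean trait relaxes towards zero; scanning $\beta$ for several values of $L$ in the same way would give curves of the kind described for fig. 1 (right).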

I discuss the Hopfield landscape in the didactically simple case of $p = 2$, $\xi^\alpha_i = \pm 1$. One
can easily evaluate $\Gamma(\vec m)$ numerically: it exhibits a maximum at $\vec m = 0$, and is higher on the
axes. Again, it becomes flatter and flatter as $\beta$ increases. Let us consider the case in which
$\kappa \phi(\vec m) = \kappa_1 \theta(m^1 - m^1_0) + \kappa_2 \theta(m^2 - m^2_0)$. One can identify the error thresholds $(\beta^1_{\rm th}, \beta^2_{\rm th})$
on the two axes, and the actual threshold will take place at the smaller $\beta_{\rm th}$. However, there
might be a second threshold at a larger value of $\beta$, where $\vec m^*$ moves from one axis to another,
and even a third one, where $\vec m^*$ acquires two nonzero components. See fig. 2.
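For $p = 2$ the function $\Gamma(\vec m)$ can be written in terms of the one-trait result under additional assumptions. The sketch below assumes that the components $\xi^1_i, \xi^2_i$ are independent and equal to $\pm 1$ with equal probability, and that $\beta$ is uniform; since $K_{\max}$ is even in $h$, the average over $\xi$ then gives $\Gamma(m^1, m^2) = \frac{1}{2}\left[ \Gamma_1(m^1 + m^2) + \Gamma_1(m^1 - m^2) \right]$, with $\Gamma_1$ the $p = 1$ function. The function names and the values of $\kappa_1, \kappa_2$ are hypothetical; the four candidate optima are the points O, A, B, C of fig. 2.

```python
import math


def gamma1(m, beta):
    """One-trait Gamma for xi = +/-1 and uniform beta (valid for |m| < 1)."""
    a = math.exp(-2.0 * beta)
    s = a * m / math.sqrt(1.0 - m * m)
    return (beta + math.log(math.sqrt(1.0 + s * s) + math.sqrt(s * s + a * a))
            - math.asinh(s) * m)


def gamma2(m1, m2, beta):
    """Two-trait Gamma for independent symmetric +/-1 patterns (|m1| + |m2| < 1)."""
    return 0.5 * (gamma1(m1 + m2, beta) + gamma1(m1 - m2, beta))


def best_state(beta, kappa1, kappa2, m01=0.33, m02=0.5):
    """Label of the candidate point maximizing kappa*phi(m) + Gamma(m)."""
    candidates = {"O": (0.0, 0.0), "A": (m01, 0.0), "B": (0.0, m02), "C": (m01, m02)}

    def score(point):
        m1, m2 = point
        gain = (kappa1 if m1 >= m01 else 0.0) + (kappa2 if m2 >= m02 else 0.0)
        return gain + gamma2(m1, m2, beta)

    return max(candidates, key=lambda name: score(candidates[name]))


r = 0.3
on_axis = gamma2(r, 0.0, 2.0)
on_diagonal = gamma2(r / math.sqrt(2.0), r / math.sqrt(2.0), 2.0)
state_high_mutation = best_state(1.0, 1e-4, 1e-4)
state_low_mutation = best_state(6.0, 1e-4, 1e-4)
```

Restricting the search to the four corner points relies on $\Gamma$ decreasing away from the origin in each component, so that within each region where the step functions are constant the best point is the corner closest to the origin.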

In the Royal Road case, each block will have its own $\Gamma(m)$ function. If the fitness function
$\phi(\vec m)$ is a sum of contributions, one for each block, the stationary point will be determined
independently for each component as the optimum of $\kappa_\alpha \phi_\alpha(m^\alpha) + \Gamma(m^\alpha)$. On the other hand,
if there is epistatic interaction among blocks, appearing in $\phi(\vec m)$, one can find a more complex
phase diagram.

Fig. 2 - The phase diagram in the $(\beta, \kappa_2)$ plane for the $p = 2$ Hopfield landscape, with $\kappa \phi(\vec m) =
\kappa_1 \theta(m^1 - 0.33) + \kappa_2 \theta(m^2 - 0.5)$, and $\kappa_1 = 0.377417 \times 10^{-4}$. The letters denote the stability regions for
the points O: $\vec m^* = (0, 0)$, A: $\vec m^* = (0.33, 0)$, B: $\vec m^* = (0, 0.5)$, and C: $\vec m^* = (0.33, 0.5)$.

Summarizing, I have shown how it is possible to solve for the stationary behavior of the
quasispecies equation in a number of nontrivial fitness landscapes, provided the
"thermodynamic" and the slow change limits are taken. The thermodynamic limit seems far-fetched if
one is considering the evolution of binding motifs, as in the work on DNA binding motifs cited
above. Nevertheless, the error threshold is well identified by the present approach, and the
basic conclusion that $m^*$ is close to the threshold follows directly. The transition appears to be
first-order in our language, since it corresponds to the "bulk" transition, while in that work it is
described as the corresponding wetting transition near a wall. The slow change limit applies in
the stationary regime, and can also be valid in the transient regime if the mutation rates are not
too small: the condition is that the number of generations needed to equilibrate with a given
value of $\vec\lambda$ should be smaller than the number of generations in which $\vec\lambda$ itself varies
significantly. This is true if the selective pressures are not too large, and the mutation rates not
too small. The application of the present approach to the dynamics is a problem worth further
investigation.